Haplotype Motifs: An Algorithmic Approach to Locating Evolutionarily Conserved Patterns in Haploid Sequences
Author
Abstract
The promise of plentiful data on common human genetic variations has given hope that we will be able to uncover genetic factors behind common diseases that have proven difficult to locate by prior methods. Much recent interest in this problem has focused on using haplotypes (contiguous regions of correlated genetic variations), instead of the isolated variations, in order to reduce the size of the statistical analysis problem. In order to most effectively use such variation data, we will need a better understanding of haplotype structure, including both the general principles underlying haplotype structure in the human population and the specific structures found in particular genetic regions or sub-populations. This paper presents a probabilistic model for analyzing haplotype structure in a population using conserved motifs found in statistically significant sub-populations. It describes the model and computational methods for deriving the predicted motif set and haplotype structure for a population. It further presents results on simulated data, in order to validate the method, and on two real datasets from the literature, in order to illustrate its practical application.
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
High fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns that are highly accessed in mobile web service sequences. Unlike the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA seco...
Identifying Property Based Sequence Motifs in Protein Families and Superfamilies: Application to DNase-1 Related Endonucleases
MOTIVATION: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. RESULTS: We have developed a ...
Phylogenetic Analysis of Beta-Glucanase Producing Actinomycetes Strain TBG-CH22 - A Comparison of Conventional and Molecular Morphometric Approach
Actinomycetes are inexhaustible producers of commercially valuable metabolites and are continually screened for beneficial compounds. The taxonomic and phylogenetic study of novel actinomycetes strains is mostly based on conventional methods and the primary DNA structure of 16S rRNA. Although the 16S rRNA sequence is well accepted in phylogeny studies, its secondary structures have not been widely used. ...
Rab11 in Disease Progression
Membrane/protein trafficking in the secretory/biosynthetic and endocytic pathways is mediated by vesicles. Vesicle trafficking in eukaryotes is regulated by a class of small monomeric GTPases, the Rab protein family. Rab proteins represent the largest branch of the Ras superfamily GTPases, and have been implicated in a variety of intracellular vesicle trafficking and different intracellular sig...
Journal title:
- Proceedings. IEEE Computer Society Bioinformatics Conference
Volume 2, Issue
Pages -
Publication date 2003